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1. Introduction 

Phylogenetic comparative methods are statistical methods for analyzing data 
of groups of related species, called comparative data in the ecology and evo- 
lution literature. Since the species are related by shared evolutionary history, 
it may not be reasonable to view such data as independent, identically dis- 
tributed realizations of the same stochastic process. Instead, information about 
the shared evolutionary history, explained by the phylogeny for the species, is 
often incorporated into the analysis. The importance of incorporating the phy- 
logenetic tree has been validated through various studies. For a partial list of 
such studies the reader may refer to [2, 3, 5, 8, 9, 11, 20] and references therein. 
While the phylogenetic tree describes the evolutionary relationship, the trait 
evolution of n species is considered as an n-tuple of random variables which 
evolve as continuous Markovian processes with statistical dependency described 
by the phylogenetic tree. For a species, let yt denote its trait value at time t, 
for instance the body and/or brain mass, and it is described via an appropriate 
stochastic process along the evolution. In this paper, as in [7], we assume that 
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the response trait evolves toward an optimum, 6. More precisely, this paper con- 
siders that yt is a solution of an Ornstein-Uhlenbeck (OU) stochastic differential 
equation (SDE) [4, 13, 15]. 

dyt = -aiiyt - 0)dt + <7ydW^, (1.1) 

where ai measures the rate of adaption toward the optimum 0, is a white 
noise with mean zero and appropriate covariance, and ay is the standard de- 
viation of the random change. Considering the OU dynamics of evolution as 
expressed in (1.1), we note that the deterministic part is responsible for a lin- 
early increasing pull of the trait toward the primary optimum, and the stochastic 
part expresses an indirect change. 

For the model as described in (1.1), the works [2. 7] considered that the 
optimal value of the trait, 0, is constant. The study in [9] relaxed this restriction 
by assuming that the optimum 6t at any point on the phylogeny is a linear 
function of some stochastic variable x which evolved according to a Brownian 
motion. 

In this paper, considering that the optimal trait, 9t, expressed via a linear 
regression on the predictor x, we assume that the predictor x evolves as an OU 
stochastic process. In other words, our phylogenetic model is given, now, via the 
following coupled equations. 

dyt = ~ai{yt - et{x))dt + UydW^ (1.2) 
dxt = -a2{xt - -i)dt + a^dW^ (1.3) 

where ai,a2 measure forces for pulling the trait y and x back to the their own 
optima, 6t{x),j, respectively; the white noise is correlated with the white 
noise Wf; 7 is considered constant and thus the covariance of traits does not 
change (see (2.14)). For this reason, without loss of generality, 7 will be ignored 
from our analysis. Expressing the trait evolution using (1.2) and (1.3) enables 
both the response and predictor evolved under the same type of Markovian 
processes while the Brownian motion considered in [9] is a special case for the 
predictor. 

The linear dependency of the optimal trait value 9t on the predictor x implies 
that 6t is also a solution of an appropriate OU stochastic differential equation. 
Having described the trait value and its corresponding optimal as solutions 
of two appropriate OU SDKs, we establish the evolutionary regression curve 
between the predictor xt and the trait yt (see Theorem 2.1). The key step for 
finding the regression curve is Proposition 2.1 which demonstrates the regression 
estimates on trait values, yt, given the optimal 9t. To estimate the regression 
parameters, generalized least squares estimates (GLS) [12] are used. The GLS 
estimates though depend on several parameters, e.g. the rate of adaptations and 
the variance of the trait values, which appear in the design and variance matrices 
of the regression. Overpassing this hurdle, we develop an algorithm different 
from [9] which identifies the maximum likelihood estimators by using Powell's 
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method [17, 18]. Briefly, Powell's method relies on derivative-free univariate line 
optimization techniques, making the computations feasible. 

Our methodology is then applied to an ecological data set of the evolution 
of the woodcreepers (see Figure 4). As woodcreepers use their tail for body 
support, we study how the tip width (the width of the rachis at the base of the 
medial rectrix) would adapt to the base width (the width of the rachis at the 
tip of the rectrix). Given this data set, our model is compared with the recent 
regression methods established in [9] . 

Our paper is organized in the following way. Section 2 describes the establish- 
ment of the OU regression curves. Section 3 provides an algorithmic implemen- 
tation for estimating the various parameters of the regression curves of Section 
2. and Section 4 considers an ecological data set of woodcreepers as a showcase. 
Finally, Section 5 offers a discussion on the subject. 



2. Methodology 

Let the response trait at time t, yt, be a solution of the SDE as in (1.2). Let 
further assume that the optimum is linearly changing according to the predictor 
X, i.e. Ot = + biXt- The predictor variable Xt is presumed to be a solution 
of an OU stochastic differential equation given in (1.3). The linear relationship 
between the optimum, 9t, and the predictor, xt, implies that 9t will be a solution 
of an OU process. At this point, we will write 9t = 9t {x) by suppressing the 
variable x. Let consider 9t be a solution of the SDE below. 

d9t ^ -ai9tdt + CTedWf , (2.1) 

where Wf is correlated with the Brownian motion, Wf , considered in (1.2), 
measures the force of adaptation to the optimum, and ae = hiax measures the 
magnitude of stochastic perturbation of 9t. Without loss of generality we will 
assume that the rates of adaptation in (1.2) and (2.1), respectively, are equal, 
i.e. ai = as = a. This assumption is not essential and it is made just to simplify 
the established formulas of the moments of the trait, j/t, and the optimum, 9t, 
verified in Lemmas 2.1-2.4. 

Theorem 2.1 is the central theorem of this paper and establishes theoretically 
the evolutionary regression curve under the assumption that the trait and the 
predictor are solutions of OU-type SDEs. 

Theorem 2.1. (Evolutionary Regression Curve) Let assume that the trait value 
yt and its predictor are solutions of the coupled stochastic differential equations 
defined in (1.2) and (1-3), respectively. Let further assume that the optimum 
process 9t is related to the predictor xt via a linear regression of the form, 9t = 
bo + biXt- Then the evolutionary regression curve of the trait yt on xt is given 

by, 

E{yt\xt) - q{at) + p{at)bi{xt - Xa), (2.2) 

where for t > a, Xa is the ancestral value of x, p{at) ~ ^~^^^2(i^°2p(-*2at))~^"*^ ' 
and q{at) ~ (6o + biXa){y — exp(— at)) + yaexp(— at). 
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The proof of Theorem 2.1 is based on the foUowing proposition. 

Proposition 2.1. Let consider the stochastic differential equations, (1.2) and 
(2.1), assuming that ai = as = a. Let yt,0t be their corresponding solutions 
with initial conditions 9o, yo, respectively. The regression of the trait on the 
optimum is given via E,[yt\9t] = /3o + PiOt, where 

a _ 1 - exp(-2at) - atexp{-2at) 

^ 2(1 - exp(-2ai)) ^ ' 

$0 = a9ot cxp{— at) + yo exp(— at) — f3i9o (2.4) 

Proof of Theorem 2.1. We have shown in Proposition 2.1 that the regression 
hne 

E[y,m = a0otcxp(-at)+yoexp(-at)+ ^'"^P|^^_"^^p^^'^^^^^^ 

(2.5) 

We substitute the optimal value 9t in (2.5) with its linear dependency on xt, i.e. 
bo + biXt. Then, 

lS,[yt\xt] = a{bo + biXa)texp{-at) + yaexp(-at) 
1 — exp(— 2at) — atexp(— 2at) 



2(1 - exp(-2at)) 



bi{xt - Xa), 



where Xa, ya are the ancestral values of xt and j/t, respectively. From there it is 
clear what q{at) and p{at) arc. ■ 

The proof of the Proposition 2.1 is a direct application of the definition of the 
regression line. For this reason the moments of the trait, yt, and the correspond- 
ing optimum, 9t need to be established. Lemma 2.1 and Lemma 2.3 demonstrate 
the first and second moments of the optimum 9t, and the corresponding ones 
of the trait, yt, respectively. Also, the expected value of their product is given 
in Lemma 2.2. Their proofs blend ordinary differential equations derivations 
together with stochastic analysis, and they can be found to the appendix. 

Lemma 2.1. Let consider the stochastic differential equation (2.1), assuming 
that as = a. Let 9t be its corresponding solution, i.e. an OU process, with initial 
condition 9o. Then, 

E[9t] = 9o exp(-at) (2.6) 

2 

E[9^] - exp(-2at)) + 6igcxp(-2at) (2.7) 

Lemma 2.2. Let consider the stochastic differential equations, (1.2) and (2.1), 
assuming that ai = as = a. Let yt,9t be their corresponding solutions with 
initial conditions 9o,yo- Then, 

2 2 

^[y*^*] = + (J/"^" ~ ^)exp(-2at) + {9la - a2/2)t exp(-2at) (2.8) 
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Lemma 2.3. Let consider the stochastic differential equation (1-2), assuming 
that ai = a. Let yt he its corresponding solution with initial conditions yQ. Then, 

E[?/t] = a9ot exp{—at) + yo exp{—at) (2.9) 

2 2 

Za 4a 

2 2 

+ (2a2;o^o - cr^/2)texp{-2at) + [y^ - + ^)] exp(-2at) (2.10) 

Za 4a 

Now, using the definitions of the variance and the covariance of a random 
variable, together with Lemma 2.1, Lemma 2.2, and Lemma 2.3, one establishes 
Lemma 2.4. 

Lemma 2.4. Let consider the stochastic differential equations, (1.2) and (2.1), 
assuming that ai = aa = a. Let yt,Ot be their corresponding solutions with 
initial conditions 9o,yo. Then, 

Varm=a f-'f-'^'^ (2.11) 
Za 

Cov[yue,] = ^2|l^£^gzM _ * exp(-2at)} (2.12) 

Var[y,] = (^ + ^)(1 - exp(-2at)) - ^2 exp(-2at) (2.13) 

To estimate the regression parameters, (2.3) and (2.4), we will use gener- 
alized least squares methods. For this we need to find the variance and the 
covariance of the residuals. Let = y^ — E[?/i|6'i] be the ith residual associated 
with the ith prediction from the regression curve as verified in Proposition 2.1. 
The covariance between traits can be obtained by utilizing the definition in [8] . 

Cov[yi,yj] = Cov[[E[yi\ya],E[yj\ya]], 

where ya is the ancestral value of the y. Let consider a species which at time ta 
the pair of the trait and its corresponding optimum was {ya,Oa). Furthermore, 
let assume that at time ta the species diverges, and it takes ti, tj time for the 
i, j offspring respectively to evolve until the current time. In this paper, we 
assume that ti = tj = where tij is the total time of the evolution of the 
i and j offsprings until current time. When we proceed with the phylogenetic 
data analysis, the time t is equal to tij for the processes yt, Of. Thus, the index 
t standing for the time will be suppressed. 

The conditional expectation of the trait yi given its ancestral value ya, V,[yi\ya], 
is given below. 

Ebi|ya] = aOa^ exp{-atij /2) + yaexp{-atij /2). 
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Then, the covariance between the traits of two species and yj is 

at 

Cov[yi,yj] = Var[^^ exp(-aiy/2)0a + cxp{-at,j /2)ya] 
= ^exp(-at,-)--'"P^-'"^''^"^ 



2a 



+ exp(-at,,)[(-^ + -^)(1 - exp(-2ata)) 
2a 4a 

2 ^aexp(-2ata) 
- (f^e ^ )(l + aia)J, (2.14) 

where is the branch length from the base of the phylogeny to the tip, and 
j/a, 9a are the ancestral values of the trait y and its corresponding optimal 9, 
respectively. The covariance between a pair of residuals, Cov{ri, Vj) is expressed 
as following 

Cov[r,, r,] = Cov[y, ~ E[2/,|0,], - E[y,|0,]] 

^ Cov[y,,y,] - Cov[y,,nyj\9,]] - Coz;[yj, E[2/,|ft,]] + Cov[ny,\9,],ny,\9j]] 

(2.15) 

The Cov[yi, yj] is given in (2.14), and the rest of the factors of equation in (2.15) 
are derived below. 



3. The algorithm 

Let consider the model y — + e. The variable y £ M"^^ represents the mean 
trait value of species, X S R"'^' is the design matrix whose second column is 
multiplied by p{at) as defined in Theorem 2.1, b S R"?^^ is the vector with q 
regressors, and the error structure e follows a normal distribution, e ^ N{0, V). 
Furthermore assuming that a is the maximum likelihood estimator (MLE) of 
a, one could estimate the MLEs for the mean and the variance of the predictor 
X via the following equations. 



E[x] = {l'mr'l'T7^x, 

{x~E[x]yT7\x-E[x]) 




where 1 G M" ^ ^ is a vector with all entries 1 ; Tq, is the covariance matrix under 
the OU process for the trait evolution, i.e. Ta[i, j] = exp(— 2a;^ij) i-p^pC-^"*") 
[2, 11]. In order to estimate the parameters a, cr^, we consider the log- likelihood 
function £ as following, 

77 1 1 

£ia, al) = -- log(2^) - - log(det(F)) -{y - Xh)V-\y - Xh). (3.1) 
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Model 


Method 


Regression Line 


r' 


AICc 


Ml: OUOU 


Sections 2 & 3 


y = 0.075 + 0.227X 


23.7 % 


-31.83 


M2: OUBM 


[9] 


y = 0.059 + 0.284a: 


22.2% 


-32.72 



Table 1 
Regression curves' comparison 



The regressors' vector b is estimated by 

We observe that the GLS estimates of regression parameters depend on the de- 
sign and covariance matrices, V, X, respectively; and in turn they depend on 
a, ay. Therefore, our algorithm starts with an ordinary least-squares estimate, 
bo ~ {X'^ X)~^X'^y, which is independent of the rate of adaptation, a, and 
standard deviation, ay. Then, the MLE pair {a, ay) is estimated through op- 
timizing ^ on a domain appropriate for (a, ay). The upper bound for a > is 
arbitrary. The variation of data should not exceed a range as much as the differ- 
ence between the maximum and the minimum observation, in other words we 
set the domain of ay to be [0, y(„) — After establishing the MLEs (a, ay), 
the regression estimator b is updated. The process is ongoing until the error 
measured, err = ||6 — 6o||, is less than some threshold 6. We describe our algo- 
rithm below: 

Step 1: Set bo = {X'^ X)-^X'^y. 

Step 2: Search the MLEs (q;,(J^) for £o{a,a'^) through Poweh's method [17]. 
Step 3: Calculate best = iX^V^^X)^^X^V~^y. 

Step 4: Set S = ||&est ~ ^o|| 

If 5 < err, return best 

else set bo — best, Go to Step 2. 

4. Data Analysis 

Using the algorithm explained in Section 3, we establish the evolutionary regres- 
sion curve for the woodcreepers data set of n = 39 species [22] . The explanatory 
variable x is considered the rachis width at the tip and the corresponding re- 
sponse, y, the rachis width at the base. We further consider two regressors 
{q = 2). We also set the threshold for the error, err, not to exceed 6 = 10~^ 
for identifying the convergent regressors. We also compare our method with 
the recent development of [9]. The regression output and the results from the 
regression comparison are given in the Table 1. Figure 4 displays the two evo- 
lutionary curves along with the data. The reader should remark at this point 
that for the woodcrecper data set, the r^ calculated using the algorithm of this 
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D. Bridgesii 

N. Longirostris 

I — D. Tyrannina 
~^ — D. Anabatina 

D. Stictolaema 

D. Longicauda 

S. Griseicapillus 

G. Spirurus 

X. Erythropygius 

J D. Certhia 
D. Hoffinannsi 
D. Concolor 
— D. Platyrosths 
' — D. Picumnus 
■H. Perrotii 



X. Albicollis 
X. Major 

X. Promeroplrhynchus 
C. Procurvoides 
■C. Pusillus 

C. Trochllirosths 

D. Rufigula 
X. Ocellatus 

X. Lachrymosus 
X. Obsoletus 
X. Spixii 
X. Triangularis 
X. Eytoni 
X. Elegans 
X. Picus 
X. Fiavigaster 
X. Pardaiotus 
X. Guttatus 
L. Fuscus 
L. Affinis 
L. Aibolineatus 
L Leucogaster 
L Angustiroslris 
L. Squamatus 



Fig 1. The phylogenetic tree of the 39 woodcreeper species as in [22] using the algorithm of 
construction of a phylogenetic tree from [16]. 



paper is slightly larger than the corresponding value of the model in [9] . This 
signifies an improvement on the fit of the data. On the other hand, the AICc 
value, [10, 21], of the OUOU model is lower than the corresponding AICc of the 
OUBM model, the difference between the two values is Aaicc — 0.38. [1] indi- 
cates that if the AICc values of two models differ no more than 2 units, then the 
model with the higher AICc within these two units should receive consideration 
for making inference. 

5. Conclusion 

A novel approach to the phylogenetic regression analysis was presented. Our 
method considered that the trait value of the species and the corresponding 
optimum evolve with OU-type dynamics. Our method was then implemented 
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Fig 2, Evolutionary regression curves for OUOU model (red dashed line) and BMOU (bl 
solid line). 
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algorithmically using generalized least squares methods and it was further ap- 
plied to a real ecological data set of the woodcreepers' evolution. The established 
regression curve demonstrated a better fit in comparison with the model of [9] . 
Our model can be further generalized, and perhaps approach more realistic sce- 
narios, by considering that the rate of adaptation is another stochastic process. 

Appendix 

Proof of Lemma 2.1. Consider expectations on both sides of the SDE (2.1). 

Given that the mean of the Brownian motion, , vanishes, i.e. E(W^^) = 0, we 
have the following ordinary differential equation (ODE), = — aE[0t] with 

initial condition E[6'(0)] = K[9o] = 6q. Its solution is obviously the right hand 
side (RHS) of (2.6). Next, applying Ito's formula; it yields d9'^ = {a^ -2ae'^)dt + 

2cr9dW^. After considering expectations, the ODE = cr^ - 2aE[9'f] with 

initial condition E[9'^{0)] = 91 leads to (2.7). ■ 

Proof of Lemma 2.2. The stochastic integration by parts formula [13] im- 
plies d{yt9t) = ytd9t + 9tdyt + dytd9t, where dytd9t = d[yt,9t] is the quadratic 
variation. Therefore, d{yt9t) = -a9t{2yt - 9t)dt + aytdW^ + a9tdW^ . Apply- 
ing expectations, and recalling that yt,dt are adapted processes, [13], one has 
= -2aE[yt9t]+aE[9^]. The solution of this ODE is the RHS of (2.8). ■ 

Proof of Lemma 2.3. Let apply to the SDE, (1.2), expectations and substitute 
the expected value of the optimum 9t as given in Proposition 2.1. One then gets 
the ODE + aE[yt] = a9oexp{~at) with initial condition E[y(0)] = yo 

whose solution is (2.9). Next, applying Ito's formula; it yields dy^ = {ay — 
2ayt{yt — 9t)}dt + 2aytdW^. After considering expectations and substituting the 
E[yt9t] according to Proposition 2.2, the ODE = cr^ „ 2aE[y^] + 2aE[yt9t], 

with initial condition E[?/^(0)] = j/q leads to (2.10). ■ 
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